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Abstract 

We numerically test the method of non-sequential recursive pair 
substitutions to estimate the entropy of an ergodic source. We com- 
pare its performance with other classical methods to estimate the en- 
tropy (empirical frequencies, return times, Lyapunov exponent). We 
considered as a benchmark for the methods several systems with differ- 
ent statistical properties: renewal processes, dynamical systems pro- 
vided and not provided with a Markov partition, slow or fast decay of 
correlations. Most experiments are supported by rigorous mathemat- 
ical results, which are explained in the paper. 

1 Introduction 

We investigate a symbolic substitution method as a tool to estimate entropy 
of an ergodic source. The entropy we deal with is the Shannon entropy of 
finite-alphabet stationary stochastic processes, in particular those that can 
be obtained as a symbolic model of a dynamical system. 

Throughout the paper, we shall refer to this method as Non-Sequential 
Recursive Pair Substitution (NSRPS). The idea of applying recursive pair 
substitutions to symbolic sequences was first proposed by Jimenez-Montaiio, 
Ebeling and others (see |E JM] ) . but it was put into the formal context of 
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probability theory and studied more deeply by Grassberger [Grj in 2002 and 
Benedetto, Caglioti, Gabrielli [BUU] in 2006. 

We now briefly explain how the NSRPS method works. 

Let us suppose to have a finite-state stationary source, that is a device 
providing infinite sequences of symbols where each Xi is an element 

of a finite alphabet A, in such a way that the probability of receiving a given 
finite string does not vary with time. Given a sequence from such a source, 
the NSRPS method prescribes to individuate the pair (or one of the pairs) 
of symbols of maximal frequency and to substitute all its non-overlapping 
occurrences with a new symbol a ^ A. For example, given the sequence 

011010111011000111011010011 . . . , 

taken from a source yU for which yu(Ol) is the highest among the probabilities 
of symbol pairs, we substitute the pair 01 with the new symbol 2, thus 
obtaining 

2122112100211212021.... 

In the case the pair to substitute is made up of two equal symbols, not all 
the occurrences are to be substituted, but only the non-overlapping ones. 
For example, given the sequence 

00110100001010001000001100001 . . . , 

we substitute the pair 00, obtaining 

211012210120122011221.... 

Starting from a source /i with alphabet A = {0, 1}, after the first substi- 
tution we shall have a new source with alphabet Ai = {0, 1, 2} and a measure 
fii on the finite strings inherited from fi. We can then go on repeating the 
steps, introducing new symbols 3,4,... and obtaining new sources /i2, /is, . . .. 

The main theorem about the NSRPS method (Theorem 12.41) says that 
the entropy h of an ergodic source /i, which is defined by 

h{fi) = lim -J V /i(x)log2/i(x), 

lcngth(a;)=fc 

can be calculated, in the limit for the number of substitutions which 
approaches infinity, knowing only the probabilities according to of the 
individual symbols and of the pairs in the new sources, after many substi- 
tutions. We remark that the hypotheses of substituting at each step one 
of the pairs with the maximum probability is a sufficient but not necessary 
condition for the conclusion of the main theorem 12.41 to hold (see |BCG] ). 
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Numerical results about the use of this method for the estimation of 
the entropy of the english language were sketched in [Grj . Here we show 
a first systematic comparation of this method with other classical ones, by 
performing several experiments on artificial sequences. We will mainly use 
symbolic sequences constructed by dynamical systems. 

The use of symbolic models of dynamical systems as a benchmark for this 
kind of study is motivated by the following two important features: 

• dynamical systems can produce strings with many kinds of nontrivial 
statistical features (slow decay of correlations, no Markov structure, 
and so on...) 

• the dynamical/geometrical properties of the system under considera- 
tion often allow the entropy of the system to be estimated (sometime 
rigorously calculated) by some other method (Lyapunov exponents and 
geometrical properties of the invariant measure e.g.) whose results can 
be compared with the estimation done by symbolic methods. 

In order to judge the precision and the speed of the entropy estimating 
algorithm suggested by the NSRPS method, we shall compare it with other 
three much used entropy estimating methods. Two of them apply to symbolic 
sequences. They are the empirical frequencies method and the return times 
method. Finally, in the case of ergodic transformations, we calculate the 
Lyapunov exponent which converges very fast and will be considered as a 
reference value for the entropy. The use of these numerical estimators will 
be supported by rigorous mathematical results, which will be explained in 
the paper. 

In section [2] we formally present the NSRPS method and state the main 
theorem about it. In section [3] we recall some basic notions of symbolic 
dynamics. In section H] we give a review of rigorous results supporting the 
estimation of entropy by the other methods we chose: empirical frequencies, 
return times and Lyapunov exponent. In section [5] we discuss the details of 
the implementation of the above methods and the reasons of some arbitrary 
choice we could not avoid. In section [6] we present the experimental results, 
with some tables and figures. 

2 Non-Sequential Recursive Pair Substitutions 
(NSRPS) 

In this section we briefly recall from [BCG] definitions and main results on 
the NSRPS method. We introduce the terms and the notations which are 
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fundamental to state the main theorem 12 .41 We omit all the technical details 
and the proofs, which the interested reader can find in [BCGj . 

We recall from the introduction that the method we study is applied to 
symbolic sequences which are supposed to come from a finite-state stationary 
source. 

Let us call our finite alphabet A and denote with A* = W^^-^^A^ the 
collection of all finite words in the alphabet A. A word w & A* has length 
\w\ and, if \w\ = k, it will also be indicated with = wi . . . Wk- 

Let x,y & A, a ^ A and Ai = AU {a}. 

Definition 2.1. 

A pair substitution is a function G = : A* ^ A\ which is defined by 
recursively substituting all the non- overlapping occurrences of the pair xy. 
More precisely, Gui is defined substituting in w the first occurrence from left 
of xy with a and repeating this procedure to the end of the sequence. 

We consider again the example sketched in the introduction and show 
some general notation. Given the sequence 

w = 011010111011000111011010011 e {o, i}*, 

performing the substitution 01 t-^ 2 leads to 

Gli{w) = 2122112100211212021 G {0,1,2}*. 

We indicate with S{A) the set of all the stationary ergodic measures 
on A^, the only ones we shall deal with. If yU G S{A) and w_ & A*, we 
shall use the notation fi{w) to indicate the /x-measure of the cylinder set 
[wi, . . . ,Wk] = n^^i{Xi = Wi}, where the Xj's are the random variables 
which describe the stochastic process. 

The map G = G"y naturally induces a map Q = Q^y : £{A) — > £{Ai), 
as the following theorem shows. We indicate with tl{s C r} the number of 
occurrences of a subword s in a word r. 

Theorem 2.2. If ^ E £{A) and s G A\, then the limit 



Ql^i.s) = lim 



ra— >oo 



|GK)| 



exists and is constant fi almost everywhere in W- Furthermore, the values 
{^/i(s)}sg^* are the marginals of an ergodic measure on Af. 

It is obvious that a pair substitution shortens the sequence it is applied to. 
The following proposition gives an average quantification of this shortening. 
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Proposition 2.3. If x ^ y then 



=^ lim ^ = — — - (/i a. e. in w). (2.1) 

n-.oolGiw'Dl l-iJ,{xy) ^ ' ^ ' 



If X = y then 



S. f Til 1 



where x is the string made up of k symbols x. 

We now recall the definition of entropy of a process. 
Given fi G S{A) and n > 1, the quantity 

Hnifi) = - ^ At(w)log2Ai(w) 

\w\=n 

is the n-i/i order entropy. 

The n-t/i order conditional entropy is defined as 

/;,„(/i) = Hn+i{^) - Hnifi). 

It can be shown (see |Shj ) that the quantities /i„(/i) and Hn{^)/n converge 
to the same value, which is the Shannon entropy of the process fi: 

h{fi) = lim hn{fi) = lim Hn{^i)/n. (2.3) 



2.1 The main theorem 

Intuitively, after a pair substitution the information is more concentrated, 
with respect to the original sequence. 

After several substitutions, the most important blocks (the most frequent 
ones) are concentrated into symbols and the value of the entropy can be 
calculated by applying the standard formula with short blocks {Hk with 
small k). 

This can be formulated in precise terms (see [BCGj . Theorem 3.2 and 
Corollary 12. 6p and suggests that a sequence of substitutions might asymp- 
totically transfer all the information to the distribution of the pairs and 
individual symbols. This is precisely the content of the main theorem. 

To state it, we define the following objects: 

- the alphabets Aj^ = A^^i U {a^} where a^v ^ A^-i and Aq = A; 
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- the maps Gn = G'^^y^ : A%_^ A%, where xn^Vn ^ ^iv-i; 

- the maps between measures Qn = Gx^yN' 

- the measures hn = QnI^n-i, with /io = /i; 

- the quantities Z]si = Zxj^y^ and Zn = Zn ■ ■ ■ Zi. 
Theorem 2.4. f lBU^ . Theorem 3.2] If 

hm Z^ = +00 

then 

Hp) = lim (2.4) 

N-*oo 

Theorem 2.5. // at each step N the pair x^Vn is a pair with the maximum 
frequency among all the pairs of symbols of A^^i, then 

hm Zn = +00. 

Theorems 12.41 and 12.51 combined guarantee that, by performing at each 
step the substitution of a pair with maximum probabihty, the entropy of 
the original ergodic process is approximated by the 1-st order conditional 
entropy, which takes into consideration only the distribution of the single 
symbols and of the pairs of symbols. In this sense, through this method "the 
ergodic process becomes 1-Markov in the limit" . 

In practical utilizations of the above theorem we have access to the sta- 
tistical properties of the source by measuring the empirical frequency of digit 
sequences in the experimental data we have. Given a sequence 0:10:2 . . . Xn, 
the empirical distribution of the (overlapping) /c-blocks is defined naturally 
by 

and its empirical fc-entropy is defined by 

Hk{xi) = - ^ Pk{w\x'^)log2Pk{w\x'^). 

\w\=k 

Let us call G the substitution operation on the maximal frequency pair 
(if there are more than one string of maximal frequency, the lexicographic 
order is used). By ergodicity, it is possible to rephrase the above theorem 
into a statement which is more similar to what can be pratically done on 
long strings coming from the source: 
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Corollary 2.6. // ^ is ergodic, for almost each uj G 

= li,„ H.iG'M)) -Hm)) p,6) 

where Zn{uj[) = ^^^J^^i ^| is the shortening rate after n substitutions. 

Proof. Let a; be a typical realization of the system. Since the system is er- 
godic limi^oo^fc(G"K)) = i/fc(/in), hence limi^oo^2(G"K))-^^i(G'"K]) = 
hi{fin)- Moreover, in the same way, when n is fixed and / oo, Zn{uj{) Zn 
and the corollary follows from the above Theorem I2.4[ 

□ 



3 Symbolic dynamics 

In this section we briefiy recall the basic notions about symbolic dynamics 
and Kolmogorov- Sinai entropy. We already defined the entropy of a sym- 
bolic process. Entropy may be defined also for measure-preserving trans- 
formations. This will be done by associating symbolic sequences with the 
orbits of the transformation. Let us more precisely recall the definition of 
Kolmogorov-Sinai entropy /i^ of a map T : (X, B, jj) (X, B, ji) having an 
ergodic invariant measure /x. 

Let a = {Ai,...,Ak} be a finite measurable partition of X. Let i7 
be the product space {1,2,..., k}^, so that an element of J7 is a sequence 
Lu = {uJn)'^=oy where u;„ G {1, 2, ... , k} for all n. 

It is possible to translate in a standard way the dynamics of {X, B, n, T) 
into the dynamics of the space i7, which is provided with the Borel a-algebra 
B{f2) generated by the cylinder sets and the left shift transformation a. Let 
us define a map (pa '■ {X, B) {Q, B{Q)) by 

{<Pcx{x))n = UJn if G 

SO that the n-th coordinate of 0a (x) is the alphabet letter corresponding to 
the element of the partition a which T^x belongs to. 

It holds (f)a{Tx) = a{(j)ax), Vx G X. Furthermore, the map is measur- 
able and naturally transports the measure fj, on (i7, B{il)) defined by setting 
for every measurable E CI Q, ^{E) = ix[(j)~^E). 

Notice that in general the map 0a is not invertible, thus it does not always 
give an isomorphism. However, if the partition a is generating., that is the 
sets of the form Ai-^nT~^ Ai.^n. . .nT~(™~-'^Mj^ generate the a-algebra i3, then 
the map 0^ gives an isomorphism between (X, B, fi, T) and B{f]), u, a). 
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If a and /3 are two measurable partitions of (X, /i, T), their joint 'partition 
a V /3 is the set {A ni?|y4Ga;i?G/3}. If T is a measurable and non- 
singular function and a is a partition, then r~^a is the partition defined by 
the subsets \T~^A \ A G a}. 

Given the partition a = {Ai, . . . , A^} we shall denote the Shannon en- 
tropy of the partition by 

k 
i=l 

The entropy of the map T with respect to the partition a is: 

hJT,a) = lim -Hi \/ T~'a]. 

The Kolmogorov- Sinai entropy of the dynamical system (X, /i, T) is 

h^{T) = sup h^(T, a), 

a 

where the supremum is taken over all the finite partitions. 

There exist partitions whose entropy is the Kolmogorov- Sinai entropy of 
the map. 

Theorem 3.1 (Kolmogorov). Consider a dynamical system {X,fi,T). If a 
is a generating partition with respect to the map T, then 

h^{T) = h^(T,a). 

The existence of a generating partition for a dynamical system is assured 
by the following theorem. 

Theorem 3.2 (Krieger Generator Theorem [Kr] ) . For an ergodic dynamical 
system {X,fi,T) on a Lebesgue space X, such that h^{T) < oo, there exists 
a finite generating partition a. 

The identification of a generating partition is generally a challenging task. 
In the following, we shall provide some examples of generating partitions in 
specific cases. 
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4 Estimating entropy from samples 



When a process'invariant measure is explicitly known, we could in principle 
estimate the entropy by applying the definition. On the other hand, when 
we are not given explicit knowledge of the measure, we are often not able 
to know exactly the entropy of the process and the problem of entropy esti- 
mation arises. A usual approach to this problem is considering long sample 
sequences, which are looked at as parts of infinite typical sequences and thus 
representing the statistical features of the system. To such samples several 
entropy estimating algorithms can be applied. 

We shall compare the estimating algorithm suggested by the NSRPS 
method with two others, which we shall call the empirical frequencies (briefiy, 
EF) method and the return times (briefiy, RT) method. We remark that 
these methods can be applied directly to the symbolic sequence without ha- 
ving any other information on the source. For the ergodic transformations 
of the unit interval we shall use another estimating algorithm which does 
not apply to symbolic processes: the approximation of the Lyapunov expo- 
nent. We remark that the estimation of entropy by this method uses some 
additional information on the system (the derivative of the map, which is 
calculated at each step of the dynamics, and the dimension of the invariant 
measure) . 

Each estimation algorithm is supported by rigorous results, as it will be 
shown in the following sections and will be implemented in its simplest form. 

We end remarking that, while experimental examples contained in this 
paper are long artificial trajectories mostly coming from dynamical systems. 
When working on short sequences (for instance finite realization of some bio- 
physical process or experiment), surrogate analysis and a suitable correction 
of the estimator can be useful in order to take into account fiuctuations of 
entropy or implicit bias on the chosen estimator (see e.g. |MEPR] . |BHM] ). 



4.1 Empirical frequencies (EF) 

To estimate entropy directly by the definition, a simple procedure consists 
in determining the empirical distribution pk of the overlapping fc-blocks and 
taking ^''^^''^ as an estimate for h. If k is fixed and the length of the sample 
sequence n tends to infinity, then ^^^^^"^ almost surely converges to 
which tends to /i as A; — > oo. Theorem 14.11 below guarantees that these two 
limits can be performed together with k{n) ~ log2 n. 

Given the sequence X1X2 ■ ■ -Xn, the empirical distribution Pfc(-|x^) of the 
overlapping fc-blocks is defined as in (12. 5p . 
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Theorem 4.1. If fi is an ergodic measure of entropy h > 0, if k{n) ^ oo as 
n oo and if k{n) < " , then 

For the proof and further details see [Shj . Theorem II. 3. 5 and Remark 
II. 3. 6. We remark that the same result holds for non-overlapping distribu- 
tions. The reason why we chose to consider the overlapping one is to enrich 
the statistic as much as possible, as it will be explained in section [5l 

4.2 Return times (RT) 

Ornstein and Weiss proved an interesting result which links entropy and the 
so-called return times for ergodic processes. They showed in |0W] that the 
logarithm of the waiting time until the first n terms of a sequence x occur 
again in x is almost surely asymptotic to nh. 

Definition 4.2. Given a sequence x taken from an ergodic process, we define 
the n-th return time as 

Rn{x) = mm{m > 1 : x'^l'^ = x?}. 

Theorem 4.3. // fi is an ergodic process with entropy h, then 

lim —log2 -Rn(a^) = h, a. s. 

n— >cxD 77, 

For the original proof see |OWj , for an alternative one see |Shj , Theorem 
II.5.1. 

4.3 Lyapunov exponent 

If we are interested in the estimation of the entropy of a one dimensional 
system a powerful tool is the Lyapunov exponent. 

Let us consider a map T : [0, 1] [0, 1] having an ergodic invariant 
measure /x. We define its Lyapunov exponent by 

-^M = / log2 T'd^i. 
Jo 

Under some assumptions (see below) this quantity is related to the fractal 
dimension HD{n) of and the entropy /i^ of the system by the formula 



10 



HD{fi) = Y-. Hence if we know HD{fi) and estimate numerically, we 
obtain an estimation for /i^. 

Let us give a precise statement for one dimensional systems (see |LYj for 
a generalization to multidimensional systems). A map T : [0, 1] [0, 1] is 
called piecewise monotonic if there is a sequence {ZjjjgN of disjoint open 
sub intervals of [0, 1] such that Tj^. is strictly monotone and continuous for 
each i. 

Let us consider the set Ez = rijgN^ '(UjeN^j), where all iterates of T 
are in the open intervals. Let fi be an invariant ergodic measure such that 
f^{Ez) = 1. Let us consider its Lyapunov exponent and its K-S entropy 
hfj,. Let us denote by HD{X) the Hausdorff dimension of a subset X C [0, 1]. 
The Hausdorff dimension HD{fi) of a measure n, is defined as the infimum 
HD{fi) = mf^i^x)=i{HD{X)) of the dimension of full measure sets. 



Let us consider the variation of a function / : [0, 1] 
[a, b] defined by: 



on a subinterval 



f m 

varf„,6](/) = sup <^ 5Z \f{xi^i) - f{xi)\^ 



m G N, a < a;o < • • • < < 6 



We say that the derivative of a piecewise monotonic map has bounded p- 
variation if there is a function g such that g{x) = on [0, 1] \ Ez-, g = T' on 
each Zi and varj'^ ^^{g) < oo. 



Theorem 4.4 ( |HRj ) . LetT be a map on [0, 1] with finitely many monotonic 
pieces and a derivative of bounded p-variation for some p > 0. If fi is an 
ergodic invariant measure with Lyapunov exponent \^ > 0, then 



X.. 



In many of the systems we will study we have that the invariant measure 
fi we are interested to consider is absolutely continuous with respect to the 
Lebesgue measure with a regular (bounded variation or continuous) density, 
hence HD{fi) = 1. 

The Lyapunov exponent will be then numerically estimated with a Birkhoff 
average along a typical orbit of the system, hence giving 

for /i-a.e. xq, by the ergodic theorem. Experimental results indicate that 
this limit converges very fast and gives a very good estimation for /i^. 
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5 Computer simulations 



Concerning the results of the computer simulations, some comments are due 
on the way we implemented the entropy estimating algorithms. 

• About empirical frequency estimation, in our simulations we could not 

consider blocks much longer than 23 bits. This is because the algorithm 
takes a time which grows exponentially in the length of the blocks 
considered. The empirical distribution of blocks of various lengths was 
calculated on the entire symbolic sequence. 

• The return times method was performed by calculating the return times 

of strings long up to loggn, where n is the length of the symbolic 
sequences. Moreover, in order to have more reliable results, for every 
binary sequence we considered not only the return times of the initial 
strings x\, but also of Xa"*"^, a^s"*"^, • • • , , and took the average 

of their logarithms, hence what we measure is an average return time 
indicator. 

• In the implementation of the NSRPS method, at every step the substi- 

tution with a new symbol of a pair with maximum probability was 
performed, then we calculated the conditional entropy of order 1 and 
the inverse of the mean shortening estimating the entropy according 
to Corollary 12. 6[ 

The implementation of the substitutions method did not show mean- 
ingful computational constraints, since performing a pair substitution 
requires a very short time. Nevertheless, there is one algorithmic ques- 
tion to be answered: the identification of a stop condition. 

For the estimation of the entropy with NSRPS, at the moment we have 
not an analogous of Theorem 14.11 hence we have to find how many 
substitutions it is convenient to made on a finite sample string. We 
had to understand when to stop the substitutions before the sequence 
becomes too short and consequently the statistics becomes too poor. 
We chose to stop when the following condition has occurred: 

StopCond: the substituted pair has frequency < 0.02. 

The stop condition above is somewhat artificial and has no intrinsic 
relation with the symbolic process. In all the cases we studied we knew 
the true entropy or estimated it quite precisely by means of the Lya- 
punov exponent, so that we could understand when the approximation 
through the pair substitutions method was good. In all our processes, 
for which we took symbolic samples long 15 millions bits, it seems that 
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few tens of pair substitutions are enough for the estimate to become 
more or less constant when considering the first three decimal digits. 
Obviously, when the process is independent or 1-Markov at most one 
pair substitution is needed in order to have a very precise estimate of 
the entropy. On the contrary, processes which have long memory prop- 
erties need many pair substitutions. The stop condition we used does 
not take into account the memory properties of the process, so that 
it lets the algorithm performing unnecessary pair substitutions in low- 
Markov cases and stops it before useful substitutions in long-memory 
processes. Although a threshold lower than 0.02 in StopCond could 
improve the estimates, the goal is to find some criterion, both user- 
independent and sequence-dependent, which determines for each case 
the most appropriate number of substitutions to perform. 



6 Experiments 

We now describe the transformations of the unit interval generating the sym- 
bolic sequences to which we applied the entropy estimating algorithms. 

6.1 Maps 

We considered a few maps of the interval, to which we applied the construc- 
tion explained in section [3] to obtain symbolic sequences. 

6.1.1 Piecewise expanding maps 

We considered a piecewise expanding map E, defined by 

1 if^e[i,l] ' 

which is discontinuous in ^ and has two surjective branches (see Figure [6?Ti) . 
It holds E'{x) > k for all x, where k > 1 is a constant. As it is well 
known (see e. g. [Vi]), a map of this kind has a unique absolutely continuous 
invariant measure with dimension 1. Moreover, Theorem 14.41 applies and we 
can estimate the entropy by the Lyapunov exponent. A generating partition 
for E is {[0, |[, [|, 1]} (see |Bi], Exercise 3.4). 

We show the results of the entropy estimates in Table [UTT] and Figure [UTTl 
The NSRPS method gives the best estimate. Though, the substitutions 
themselves have no particular role, since the map seems to be 1-Markov (the 
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map 








^NSRPS (-^sub) 


E 


0.8673 


0.865 


0.838 


0.867 (17) 



Table 6.1: Entropy estimates for the piecewise expanding map E. The values 
hhyap, hsF, hjiT and h^sRPS are the entropy estimates as Lyapunov expo- 
nent or by empirical frequencies, return times, NSRPS, respectively. Nsub is 
the number of pair substitutions executed when the stop condition StopCond 
occurs. 

PIECEWISE EXPANDING MAP 



map 



EF 



RT 



iQOOOOOOO 



' ' 

5 


block langW 

NSRPS 


5 20 2 





20 25 



Figure 6.1: Piecewise expanding map E and entropy estimates by means of 
empirical frequencies, return times and NSRPS. The straight line corresponds 
to the Lyapunov exponent value. 



first value calculated with the substitutions algorithm is already very close 
to the true entropy). 
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6.1.2 Lorenz-like maps 

Another example of map with two non-surjective branches is a Lorenz-hke 
map (similar maps are involved in the study of the famous Lorenz system) 
defined by 

The derivative of L is uniformly greater than 1 for all x G [0, 1] \ {|} and 
L'(i^) = +00 (see Figure EH- 

As for the previous piecewise expanding map E, the Lorenz-like map L 
has a unique absolutely continuous invariant measure with dimension 1 (see 
|Vi] ) ■ Theorem 14.41 does not apply in this case because the derivative is not 
bounded and hence has not p-bounded variation. However the usual relation 
between entropy and Lyapunov exponent holds and can be recovered by [St] . 
Moreover, the natural partition {[0, |[, [|, 1]} is generating (see again |Bu] ) . 

In Table [62] and Figure [62] the results obtained for the map L are shown. 



map 


h 

' ''Lyap 






^NSRPS (A'sub) 


L 


0.7419 


0.764 


0.723 


0.748 (17) 



Table 6.2: Entropy estimates for the Lorenz-like map L. Ngub is the number 
of pair substitutions executed when the stop condition StopCond occurs. 

The Lorenz-like map L appears not to be 1-Markov. In fact, from the plot 
relative to NSRPS in Figure [U^ it can be noticed that the best value is not the 
first estimated, that is simply the 1-st order conditional entropy hi. Instead, 
there are pair substitutions that significantly improve the approximation of 
the entropy. These substitutions are those which condense more information 
than others. Furthermore, this is one of the cases in which a few more pair 
substitutions after condition StopCond occurs give a better estimate. 

6.1.3 Logistic maps 

The logistic maps are of the form 

Axx = Xx{l-x), 1<A<4 

We took A = 4, 3.8 and 3.6 (the graph of A^ g is shown in Figure 16.31 
(map)). For all these three maps, the partition {[0, |[, [|, 1]} is generating 
(see [Hu]). 
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map 



LORENZ-LIKE MAP 

EF 




5 10 15 20 25 5 10 15 20 

Figure 6.2: Lorenz-like map L and entropy estimates by means of empirical 
frequencies, return times and NSRPS. The straight line corresponds to the 
Lyapunov exponent value. 

For A = 4 there is a unique invariant measure, which is ergodic and 
absolutely continuous with respect to Lebesgue and whose density is p(x) = 
— , ^ Furthermore, the dynamical system ([0, 1], i3([0, 1]), /9(x)dx, yl4) is 

TT-v/ x{l — x) 

isomorphic to the shift on the Bernoulli process with alphabet {0, 1} and 
parameter i. Thus, for the entropy it holds h{A4) = 1. 

About the maps yls.g and A^ q we remark that the assumptions of Theorem 
14.41 still hold and the dimension of the invariant measure is estimated to be 
very close to 1 (see |Sp| ). Hence we assume to be reasonable to estimate the 
entropy by the Lyapunov exponent. 

In Table we summarize the final entropy estimates obtained with the 
four methods for the three logistic maps, while in Figure 16.31 we show in 
graphical form the complete results for the map /Is.s- 

For the map yl4 the NSRPS method does not require any substitution 
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map 


^Lyap 






/^NSRPS V^snh) 


A, 


1.0000 


0.997 


0.959 


1.000 (17) 


^3.8 


0.6234 


0.652 


0.610 


0.628 (18) 


^.6 


0.2646 


0.348 


0.314 


0.269 (18) 



Table 6.3: Entropy estimates for the logistic maps Ax. Ngub is the number of 
pair substitutions executed when the stop condition StopCond occurs. 

LOGISTIC MAP 



map 



EF 




20 25 



RT 



NSRPS 



■""o""o""o""o""o""o"'"o""0""0""0""D"""a"""a""o 



-O— O— O— O— O- 



5 10 15 2D 25 5 ID 15 20 

block length substilulion 

Figure 6.3: Logistic map for A = 3.8 and entropy estimates by means of 
empirical frequencies, return times and NSRPS. The straight line corresponds 
to the Lyapunov exponent value. 



to correctly estimate the entropy up to the sixth decimal digit. This is no 
surprise, since the symbolic process associated with is independent. 

Instead, for the map yl3.8 it happens that, similarly to the NSRPS case 
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of the map L (see Figure there are pair substitutions which are more 
important than others in approximating the value of the entropy, as it can 
be noticed in Figure I6.3[ 

The entropy estimating algorithms give for the map yls.g results that are 
qualitatively similar to those of vls.s- 

6.1.4 Manneville-Pomeau maps 

Manneville maps exhibit dynamics with long range correlations. They are 
defined by 

M^x = X + x"^ (mod 1), 2 G IR+. 

Such maps have great interest in physics and possess different charac- 
teristics as the exponent z varies. We focused our attention on the values 
1 < z < 2, for which the maps admit a unique absolutely continuous invari- 
ant probability measure (with unbounded density). For these parameters, 
the system has power law decay of correlations, and the rate is slower and 
slower as z approaches 2 (see [Vi], section 3 e.g.). In this case the system 
has "long memory" and to estimate entropy by the empirical frequencies we 
would need long blocks. For z > 2 the absolutely continuous invariant mea- 
sure is no longer finite. We also remark that since those maps have bounded 
variation derivative, in the cases where the absolutely continuous invariant 
measure is finite we can again estimate the entropy by the Lyapunov expo- 
nent. We took values of z which go very close to 2: 2:1 = |, ^2 = I, -^3 = 
2:4 = f|, 2:5 = H, 26 = ^ (see the plot of M^^ in Figure lOl) . For all 
1 < i < 6 it holds M^x) > 1 for all x e]0, 1] and M^0+) = 1. For these 
maps the natural partitions {[0, q[, [q, 1]}, where q g]0, 1[ is that value such 
that M^. (c~) = 1 and M^Xcf) = 0, are obviously generating. 

The presence in of an indifferent fixed point is the main responsible 
for the peculiar behaviour of the Manneville maps. When, starting from 
a random point xq, after a certain number n of iterations the point M^xq 
happens to be very close to 0, the subsequent iterations remain very close to 
for a long time. This fact translates in having many consecutive zeros in 
the binary symbolic sequence associated with the orbit of Xq. These strings 
of zeros can be long even hundreds of thousands of bits or more. The closer 
to 2 is the exponent z, the longer and more frequent these strings. 

In carrying out the simulations for the Manneville maps and commenting 
their results, one cannot ignore the peculiarities of these maps. It turns out 
that the symbolic sequences we generated are too short to reflect the general 
characteristics of the maps. If in a sequence of 15 millions bits there happen 
to be groups of consecutive zeros that are hundreds of thousands of bits 
long, then the results obtained from such a sequence cannot be completely 
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reliable. The usual approach to this problem is considering many sequences, 
generated from different initial random points, and taking the averages of 
the estimates. For the map M^. we considered 2i sequences, with 1 < i < 6. 
Still, the values obtained from the various sequences are quite different, so 
that we cannot consider completely reliable the averages as well. 

Bearing in mind these considerations, we report in graphic form the re- 

31 . 

suits for the Manneville map M^^ = z + zTe (mod 1) (see Figure while 
the results for all the six Manneville maps considered are shown in Table EH 



map 


^Lyap 






^NSRPS (^sub) 


Ma 

2 


0.811 


0.804 


0.821 


0.813 (18) 


Mr 

4 


0.519 


0.522 


0.558 


0.511 (20) 


Ml5 

8 


0.314 


0.340 


0.442 


0.322 (19) 


Mai 

16 


0.228 


0.244 


0.444 


0.226 (21) 


M6a 

32 


0.175 


0.234 


0.400 


0.216 (21) 


Ml27 

64 


0.168 


0.214 


0.358 


0.196 (21) 



Table 6.4: Entropy estimates for the Manneville maps Mz^. Ngub is the aver- 
age number of pair substitutions executed when the stop condition StopCond 
occurs. 

For each Manneville map that we studied (except for M?), the entropy 
estimates obtained through the NSRPS method were clearly the closest to 
the true entropy (which we assumed to be equal to the average Lyapunov 
exponent), although they were not as close as for the other maps or processes 
(see section [6l2l) . 

6.1.5 A skew product 

We consider an example of a two dimensional system having long range 
correlations which is quite different from the Manneville map. Let us consider 
the following map S : [0, 1]^ [0, 1]^ defined by 

S{x,y) = {Ex,y + a(j){x) mod 1) (6.1) 
f 1 if X > - 

where: (h(x) = ~?,aisa diofantine irrational and E is the one 

' \ if X < ^ ' 

dimensional piecewise expanding map considered in section I6.1.1[ In the 
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map 



MANNEVILLE MAP 

EF 





Figure 6.4: Manneville map with z = and entropy estimates by means of 
empirical frequencies , return times and NSRPS. The straight line corresponds 
to the Lyapunov exponent value. 



system the x coordinate is subjected to a chaotic transformation, while the 
y is rotated according to the value of x. Such systems preserve an absolutely 
continuous invariant measure and are mixing. Some estimations for the decay 
of correlations are given in [Pol] . 

We partitioned the unit square in four equal squares Qi,...,Q4 having a 
common vertex at |). 

The entropy of S with respect to the partition {Qi, Q4} is the same as 
the entropy of E, indeed the rotation has zero entropy and a symbolic orbit 
for the two dimensional system can be constructed by the information given 
by its symbolic orbit for the one dimensional map E and the information 
relative to the rotation part. Although the entropy is the same, its estimation 
is much more complicated, as the experiments show. 

In figure [631 we consider the case where the golden ratio. The 
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empirical frequencies and the substitutions seem to converge to a value which 
is slightly greater than the true entropy. The return time instead seems to 
better approximate the entropy in this case. 

SKEW PRODUCT 



EF 



RT 



NSRPS 



Figure 6.5: Results for the skew product S: entropy estimates by means of 
empirical frequencies, return times and NSRPS. The straight line corresponds 
to the entropy value. 



6.2 Renewal processes 

Apart from the symbolic sequences obtained from ergodic transformations of 
the unit interval, we considered sequences taken from the so-called renewal 
processes. 

A renewal process is a stationary process with alphabet {0, 1} for which 
the distances between consecutive ones are independent and identically dis- 
tributed random variables. When a symbol '1' occurs, the sequence forgets 
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all its past and the probability of having the next '1' after j bits is pj, where 
<Pj <l and ^ Pj = 1. 

We considered such renewal processes, with pi = P2 = ■ ■ ■ = = 
5 < k < 9, which we shall indicate with RP2k. 

For these renewal processes, the value of their entropy can be calculated 
exactly. We recall in fact that the entropy of a process is the number of bits 
per symbol that are necessary to describe the process itself. The quantity 
C = — J2j Pj log2 Pj represents the number of bits that one needs to describe 
the process of the jumps between consecutive ones. In other words, C is the 
entropy of a random variable which describes the length of the jumps. If n is 
large, with n jumps {nC bits) we describe a sequence long about nL symbols, 
where L is the average length of the jumps. Thus, 



nL 



In our cases, where pi 



T.j>iJPj 

= P2* = ^ and Pj = for j > 2^, we have 



2k 



2'' + 1 



In Table 16.51 we show the results of the entropy estimates for the renewal 
processes RP2k and those of RP32 are also plotted in Figure | 



map 


h 






^NSRPS (Nsuh) 


RP32 


0.303030 


0.320 


0.272 


0.303067 (11) 


RPq4 


0.184615 


0.196 


0.153 


0.184793 (22) 


RP128 


0.108527 


0.115 


0.110 


0.108498 (25) 


RP256 


0.062257 


0.066 


0.055 


0.062239 (18) 


RP512 


0.035088 


0.037 


0.039 


0.035112 (16) 



Table 6.5: Entropy estimates for the renewal processes RP2k. Nsut is the 
number of pair substitutions executed when the stop condition StopCond oc- 
curs. 

For this process, the substitutions method gives an excellent approxima- 
tion of the entropy already after five pair substitutions. After these substitu- 
tions all the memory of the process has been transferred to the distribution 
of the pairs, so that the sequence has become 1-Markov. 
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RENEWAL PROCESS 



EF 



RT 



°OOOOOOOOOOOOOOOOOOOOOi 



20 25 



NSRPS 




Figure 6.6: Results for the renewal process entropy estimates by means 

of empirical frequencies, return times and NSRPS. The straight line corre- 
sponds to the entropy value. 



7 Conclusions and final remarks 

The performance of the three symbohc methods is summarized in Figure mi 
Summarizing, NSRPS results to be the method that best approximates the 
entropy value. To this aim, it is a fast and light computational tool that may 
be used also for systems having low entropy or long range correlations where 
other statistical methods fail. 

This paper shows for the first time a comparison in entropy estimation 
among NSRPS and other well-known methods. The results also open some 
further questions about NSRPS: 

• how to prove an analogous of Theorem 14. II for NSRPS giving a sufficient 
number of substitutions in function of the length of the string? 
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1.2 



EXPANDING 



LORENZiIKE 



SKEW PRODUCT 



LOGISTIC 
■ 

- -B - ■ 



RENEWAL k-S 



MANNEVILLE z_4 



I 1 1 1 ' ' 1 

5 10 15 20 

Figure 7.1: Entropy estimates for the maps A^ g, E, L, Msi and the renewal 

16 

process RP32: symbol empty □ refers to the NSRPS value under StopCond 
condition; full □ refers to the EE value and o refers to the RT method. 
Straight lines show the entropy values (for the maps they are the estimated 
Lyapunov exponents). 



• are there other meaningful substitution methods (different from the 
recipe given in Theorem I2.5p that may be proved to be (at least) suffi- 
cient for Theorem 12.41 to hold? 

• can the joint use of NSRPS and Lyapunov exponent (which are both 
fast converging and fastly computable) together with Theorem 14.41 give 
a particularly good method to numerically estimate the Hausdorff di- 
mension of an attractor? 

• concerning the applications of NSRPS to non-artificial processes, such 
as literary texts, biological sequences (DNA, proteins) and time series 
in general, what interesting features of the driving dynamics may be 
extracted? 

• NSRPS method might be the core of some data compression algorithm 
(see |LaMoj ). This should pave the way to some investigations towards 
its compression capabilities in comparison with other well-known algo- 
rithms. We remark that data compression procedures have also been 
successfully used as entropy estimators (see e.g. [BBGMV] and |GKBj ). 
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